- CHAPTER 1
-
- AN INTRODUCTION TO COMPILED BASIC
-
- This chapter explores the internal workings of the BASIC compiler. Many
- people view a compiler simply as a "black box" which magically transforms
- BASIC source files into executable code. Of course, magic does not play
- a part in any computer program, and the BC compiler that comes with
- Microsoft BASIC is no exception. It is merely a program that processes
- data in the same way any other program would. In this case, the data is
- your BASIC source code.
- You will learn here what the BASIC compiler does, and how it does it.
- You will also get an inside glimpse at some of the decisions a compiler
- must make, as it transforms your code into the assembly language commands
- the CPU will execute. By truly understanding the compiler's role, you will
- be able to exploit its strengths and also avoid its weaknesses.
-
-
- COMPILER FUNDAMENTALS
- =====================
-
- No matter what language a program is written in, at some point it must be
- translated into the binary codes that the PC's processor can understand.
- Unlike BASIC, with its powerful commands, the CPU within every PC can act on
- only very rudimentary instructions. Some typical examples of these instructions
- are "Add 3 to the value stored in memory location 100", and "Compare the
- value stored at address 4012 to the number -12 and jump to the code at
- address 2015 if it is less". Therefore, one very important value of a
- high-level language such as BASIC is that a programmer can use meaningful
- names instead of memory addresses when referring to variables and
- subroutines. Another is the ability to perform complex actions that
- require many separate small steps using only one or two statements.
- As an example, when you use the command PRINT X% in a program, the value
- of X% must first be converted from its native two-byte binary format into
- an ASCII string suitable for display. Next, the current cursor location
- must be determined, at which point the characters in the string are placed
- into the screen's memory area. Further, the cursor position has to be
- updated, to place it just past the digits that were printed. Finally, if
- the last digit happened to end up at the bottom-right corner of the screen,
- the display must also be scrolled up a line. As you can see, that's an
- awful lot of activity for such a seemingly simple statement!
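The first of those steps, turning a two-byte binary value into printable characters, can be sketched in Python. This is only an illustration of the idea, not the routine BASIC actually uses; the function name `int16_to_ascii` is invented for this example, and the sketch ignores any padding spaces PRINT may add.

```python
import struct

def int16_to_ascii(raw: bytes) -> str:
    """Convert a two-byte little-endian signed integer to its decimal digits."""
    (value,) = struct.unpack("<h", raw)      # decode the native two-byte format
    if value == 0:
        return "0"
    negative = value < 0
    magnitude = -value if negative else value
    digits = []
    while magnitude > 0:
        magnitude, remainder = divmod(magnitude, 10)
        digits.append(chr(ord("0") + remainder))  # one ASCII digit per division
    if negative:
        digits.append("-")
    return "".join(reversed(digits))         # digits were produced low-order first
```

Even this small piece takes a loop of divisions, which hints at why a single PRINT expands into so much work.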
- A compiler, then, is a program that translates these English-like BASIC
- source statements into the many separate and tiny steps the microprocessor
- requires. The BASIC compiler has four major responsibilities, as shown in
- Figure 1-1 below.
-
-
- 1. Translate BASIC statements into an equivalent series of assembly
- language commands.
-
- 2. Assign addresses in memory to hold each of the variables being used
- by the program.
-
- 3. Remember the addresses in the generated code where each line number
- or label occurs, for GOTO and GOSUB statements.
-
- 4. Generate additional code to test for events and detect errors when
- the /v, /w, or /d compile options are used.
-
- Figure 1-1: The primary actions performed by a BASIC compiler.
-
-
- As the compiler processes a program's source code, it translates only the
- most basic statements directly into assembly language. For other, more
- complex statements, it instead generates calls to routines in the BASIC
- run-time library that is supplied with your compiler. When designing a
- BASIC program you would most likely identify operations that need to be
- performed more than once, and then create subprograms or functions rather
- than add the same code in-line repeatedly. Likewise, the compiler takes
- advantage of the inherent efficiency of using called subroutines.
- For example, when you use a BASIC statement such as PRINT Work$, the
- compiler processes it as if you had used CALL PRINT(Work$). That is, PRINT
- really is a called subroutine. Similarly, when you write OPEN FileName$
- FOR RANDOM AS #1 LEN = 1024, the compiler treats that as a call to its Open
- routine, and it creates code identical to CALL OPEN(FileName$, 1, 1024, 4).
- Here, the first argument is the file name, the second is the file number
- you specified, the third is the record length, and the value 4 is BASIC's
- internal code for RANDOM. Because these are BASIC key words, the CALL
- statement is of course not required. But the end result is identical.
- While the BC compiler could certainly create code to print the string
- or open the file directly, that would be much less efficient than using
- subroutines. Indeed, all of the subroutines in the Microsoft-supplied
- libraries are written in assembly language for the smallest size and
- highest possible performance.
-
-
- DATA STORAGE
-
- The second important job the compiler must perform is to identify all of
- the variables and other data your program is using, and allocate space for
- them in the object file. There are two kinds of data that are manipulated
- in a BASIC program--static data and dynamic data. The term static data
- refers to any variable whose address and size do not change during the
- execution of a program. This includes all simple numeric and TYPE variables,
- and static numeric and TYPE arrays. String constants such as "Press a key
- to continue" and DATA items are also considered to be static data, since
- their contents never change.
- Dynamic data is that which changes in size or location when the program
- runs. One example of dynamic data is a dynamic array, because space to
- hold its contents is allocated when the program runs. Another is string
- data, which is constantly moved around in memory as new strings are
- assigned and old ones are erased. Variable and array storage is discussed
- in depth in Chapter 2, so I won't belabor that now. The goal here is
- simply to introduce the concept of variable storage. The important point
- is that BC deals only with static data, because that must be placed into
- the object file.
- As the compiler processes your source code, it must remember each
- variable that is encountered, and allocate space in the object file to hold
- it. Further, all of this data must be able to fit into a single 64K
- segment, which is called DGROUP (for Data Group). Although the compiled
- code in each object file may be as large as 64K, static data is combined
- from all of the files in a multi-module program, and may not exceed 64K in
- total size. Note that this limitation is inherent in the design of the
- Intel microprocessors, and has nothing to do with BC, LINK, or DOS.
- As each new variable is encountered, room to hold it is placed into the
- next available data address in the object file. (In truth, the compiler
- retains all variable information in memory, and writes it to the end of the
- file all at once following the generated code.) For each integer variable,
- two bytes are set aside. Long integer and single precision variables
- require four bytes each, while double precision variables occupy eight
- bytes. Fixed-length string and TYPE variables use a varying number of
- bytes, depending on the components you have defined.
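The bookkeeping described above can be modeled in a few lines of Python. This is a rough sketch under simplifying assumptions: offsets start at zero, only the four numeric type suffixes are handled, and the names `SIZES`, `DGROUP_LIMIT`, and `allocate` are invented for the illustration rather than taken from BC.

```python
SIZES = {"%": 2, "&": 4, "!": 4, "#": 8}   # bytes per type suffix, as given in the text
DGROUP_LIMIT = 64 * 1024                   # all static data must fit in one segment

def allocate(names):
    """Give each newly seen variable the next free offset.

    Returns a dict of name -> offset and the total number of bytes used.
    """
    offsets = {}
    next_free = 0
    for name in names:
        if name in offsets:
            continue                       # already has a home
        size = SIZES[name[-1]]             # the suffix selects the size
        if next_free + size > DGROUP_LIMIT:
            raise MemoryError("static data exceeds 64K")
        offsets[name] = next_free
        next_free += size
    return offsets, next_free
```

Calling `allocate(["A%", "Total&", "Rate!", "Sum#", "A%"])` places the four distinct variables at offsets 0, 2, 6, and 10, for 18 bytes in all; the second mention of `A%` consumes nothing.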
- Static numeric and TYPE arrays are also written to the object file by
- the compiler. The number of bytes that are written of course depends on
- how many elements have been specified in the DIM statement. Also, notice
- that no matter what type of variable or array is encountered, only zeroes
- are written to the file. The only exceptions are quoted string constants
- and DATA items, in which case the actual text must be stored.
- Conventional (variable-length) strings are handled differently from numeric,
- TYPE, and fixed-length string variables. For each string variable a program uses, a
- four-byte table called a *string descriptor* is placed into the object
- file. However, since the actual string data is not assigned until the
- program is run, space for that data need not be handled by the compiler.
- With string arrays--whether static or dynamic--a table of four-byte
- descriptors is allocated.
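To make the four-byte figure concrete, here is a small Python sketch. The layout it uses, a two-byte length followed by a two-byte address, is an assumption made for the example; this chapter does not spell out what the descriptor holds, and the function names are hypothetical.

```python
import struct

def make_descriptor(length: int, address: int) -> bytes:
    """Pack an assumed descriptor layout: 2-byte length, then 2-byte address."""
    return struct.pack("<HH", length, address)

def descriptor_table_size(element_count: int) -> int:
    """A string array needs one four-byte descriptor per element."""
    return element_count * 4
```

Whatever the exact fields, the point stands: the object file holds only the small fixed-size descriptors, while the characters themselves are placed elsewhere when the program runs.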
- Finally, each array in the program also requires an array descriptor.
- This is simply a table that shows where the array's data is located in
- memory, how many elements it currently holds, the length in bytes of each
- element, and so forth.
-
-
- ASSEMBLY LANGUAGE CONSIDERATIONS
-
- In order to fully appreciate how the translation process operates, you will
- first need to understand what assembly language is all about. Please
- understand that there is nothing inherently difficult about assembly
- language. Like BASIC, assembly language is composed of individual
- instructions that are executed in sequence. However, each of these
- instructions does much less than a typical BASIC statement. Therefore,
- many more steps are required to achieve a given result than in a high-level
- language. Some of these steps will be shown in the following examples.
- If you are not comfortable with the idea of tackling assembly language
- concepts just yet, please feel free to come back to this section at a later
- time.
- Let's begin by examining some very simple BASIC statements, and see how
- they are translated by the compiler. For simplicity, I will show only
- integer math operations. The 80x86 family of microprocessors can
- manipulate integer values directly, as opposed to single and double
- precision numbers which are much more complex. The short code fragment in
- Listing 1-1 shows some very simple BASIC instructions, along with the
- resulting compiled assembly code. In case you are interested,
- disassemblies such as those you are about to see are easy to create for
- yourself using the Microsoft CodeView utility. CodeView is included with
- the Macro Assembler as well as with BASIC PDS.
-
-
- A% = 12
- MOV WORD PTR [A%],12 ;move a 12 into the word variable A%
-
- X% = X% + 1
- INC WORD PTR [X%] ;add 1 to the word variable X%
-
- Y% = Y% + 100
- ADD WORD PTR [Y%],100 ;add 100 to the word variable Y%
-
- Z% = A% + B%
- MOV AX,WORD PTR [B%] ;move the contents of B% into AX
- ADD AX,WORD PTR [A%] ;add to that the value of A%
- MOV WORD PTR [Z%],AX ;move the result into Z%
-
- Listing 1-1: These short examples show the compiled results of some simple
- BASIC math operations.
-
-
- The first statement, A% = 12, is directly translated to its assembler
- equivalent. Here, the value 12 is *moved* into the word-sized address
- named A%. Although an integer is the smallest data type supported by
- BASIC, the microprocessor can in fact deal with variables as small as one
- byte. Therefore, the WORD PTR (word pointer) argument is needed to specify
- that A% is a full two-byte integer, rather than a single byte. Notice that
- in assembly language, brackets are used to specify the contents of a memory
- address. This is not unlike BASIC's PEEK() function, where parentheses are
- used for that purpose.
- In the second statement, X% = X% + 1, the compiler generates assembly
- language code to increment, or add 1 to, the word-sized variable in the
- location named X%. Since adding or subtracting a value of 1 is such a
- common operation in all programming languages, the designers of the 80x86
- included the INC (and complementary DEC) instruction to handle that.
- Y% = Y% + 100 is similarly translated, but in this case to assembler
- code that adds the value 100 to the word-sized variable at address Y%. As
- you can see, the simple BASIC statements shown thus far have a direct
- assembly language equivalent. Therefore, the code that BC creates is
- extremely efficient, and in fact could not be improved upon even by a human
- hand-coding those statements in assembly language.
- The last statement, Z% = A% + B%, is only slightly more complicated than
- the others. This is because separate steps are required to retrieve the
- contents of one memory location, before manipulating it and assigning the
- result to another location. Here, the value held in variable B% is moved
- into one of the processor's registers (AX). The value of variable A% is
- then added to AX, and finally the result is moved into Z%. There are about
- a dozen registers within the CPU, and you can think of them as special
- variables that can be accessed very quickly.
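If it helps to see those steps executed, the following Python sketch models a tiny CPU with a single register and runs the three instructions generated for Z% = A% + B%. It is a teaching model only: the tuple format, the `run` function, and the dictionary standing in for memory are all invented here, and only MOV, ADD, and INC are understood.

```python
def run(program, memory):
    """Execute (opcode, destination, source) tuples on a one-register model CPU."""
    regs = {"AX": 0}

    def read(operand):
        if isinstance(operand, int):
            return operand                   # an immediate value such as 12
        if operand.startswith("["):
            return memory[operand[1:-1]]     # brackets mean "contents of"
        return regs[operand]                 # otherwise a register name

    def write(operand, value):
        value &= 0xFFFF                      # keep every result word-sized
        if operand.startswith("["):
            memory[operand[1:-1]] = value
        else:
            regs[operand] = value

    for op, dest, src in program:
        if op == "MOV":
            write(dest, read(src))
        elif op == "ADD":
            write(dest, read(dest) + read(src))
        elif op == "INC":
            write(dest, read(dest) + 1)      # src is unused for INC
        else:
            raise ValueError(f"unknown opcode {op}")
    return memory
```

Running `[("MOV", "AX", "[B%]"), ("ADD", "AX", "[A%]"), ("MOV", "[Z%]", "AX")]` against a memory of `{"A%": 12, "B%": 30, "Z%": 0}` leaves 42 in `Z%`, with AX serving only as a temporary holding place.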
- The next example in Listing 1-2 shows how BASIC passes arguments to its
- internal routines, in this case PRINT and OPEN. Whenever a variable is
- passed to a routine, what is actually sent is the address (memory location)
- of the variable. This way, the routine can go to that address, and read
- the value that is stored there. As in Listing 1-1, the BASIC source code
- is shown along with the resultant compiler-generated assembler
- instructions.
- It may also be worth mentioning that the order in which the arguments
- are sent to these routines is determined by how the routines are designed.
- In BASIC, if a SUB is designed to accept, say, three parameters in a
- certain order, then the caller must pass its arguments in that same order.
- Parameters in assembler routines are handled in exactly the same manner.
- Of course, any arbitrary order could be used, and what's important is
- simply that they match.
-
-
- PRINT Work$
- MOV AX,OFFSET Work$ ;move the address of Work$ into AX
- PUSH AX ;push that onto the CPU stack
- CALL B$PESD ;call the string printing routine
-
- OPEN FileName$ FOR OUTPUT AS #1
- MOV AX,OFFSET FileName$ ;load the address of FileName$
- PUSH AX ;push that onto the stack
- MOV AX,1 ;load the specified file number
- PUSH AX ;and push that as well
- MOV AX,-1 ;-1 means that a LEN= was not given
- PUSH AX ;and push that
- MOV AX,2 ;2 is the internal code for OUTPUT
- PUSH AX ;pass that on too
- CALL B$OPEN ;finally, call the OPEN routine
-
- Listing 1-2: Many BASIC statements create assembler code that passes
- arguments to internal routines, as shown above.
-
-
- When you tell BASIC to print a string, it first loads the address of the
- string into AX, and then pushes that onto the stack. The stack is a
- special area in memory that all programs can access, and it is often used
- in compiled languages to hold the arguments being sent to subroutines. In
- this case, the OFFSET operator tells the assembler to use the address where
- the variable resides, as opposed to the current contents of the variable.
- Notice that the words offset, address, and memory location all mean the
- same thing. Also notice that calls in assembly language work exactly the
- same as calls in BASIC. When the called routine has finished, execution
- in the main program resumes with the next statement in sequence.
- Once the address for Work$ has been pushed, BASIC's B$PESD routine is
- called. Internally, one of the first things that B$PESD does is to
- retrieve the incoming address from the stack. This way it can locate the
- characters that are to be printed. B$PESD is responsible for printing
- strings, and other BASIC library routines are provided to print each type
- of data such as integers and single precision values.
- In case you are interested, PESD stands for Print End-of-line String
- Descriptor. Had a semicolon been used in the print statement--that is,
- PRINT Work$;--then B$PSSD would have been called instead (Print Semicolon
- String Descriptor). Likewise, printing a 4-byte long integer with a
- trailing comma as in PRINT Value&, would result in a call to B$PCI4 (Print
- Comma Integer 4), where the 4 indicates the integer's size in bytes.
- In the second example of Listing 1-2 the OPEN routine is set up and
- called in a similar fashion, except that four parameters are required
- instead of only one. Again, each parameter is pushed onto the stack in
- turn, followed by a call to the routine. Most of BASIC's internal routines
- begin with the characters "B$", to avoid a conflict with subroutines of
- your own. Since a dollar sign is illegal in a BASIC procedure name, there
- is no chance that you will inadvertently choose one of the same names that
- BASIC uses.
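The push-then-call pattern can be imitated in Python with an ordinary list acting as the stack. This is a loose model, not how B$OPEN is written: a real routine reads its arguments in place relative to the stack pointer, and the return address also sits on the stack. The names `b_open` and `compile_open` are made up for the sketch.

```python
def b_open(stack, memory):
    """Stand-in for the OPEN routine: collect the four arguments the caller pushed."""
    mode = stack.pop()                 # pushed last, so retrieved first
    record_len = stack.pop()
    file_number = stack.pop()
    name_address = stack.pop()
    return {
        "name": memory[name_address],  # follow the address to reach the data
        "file_number": file_number,
        "record_len": record_len,
        "mode": mode,
    }

def compile_open(memory, name_address, file_number, record_len=-1, mode=2):
    """Mimic the generated code: push each argument in order, then call."""
    stack = []
    for arg in (name_address, file_number, record_len, mode):
        stack.append(arg)              # the MOV AX,arg / PUSH AX pair
    return b_open(stack, memory)
```

The defaults of -1 and 2 mirror the values shown in Listing 1-2 for a missing LEN= and for OUTPUT. Both sides must agree on the order; if the caller pushed the file number before the name, the routine would treat the number as an address.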
- As you can see, there is nothing mysterious or even difficult about
- assembly language, or the translations performed by the BASIC compiler.
- However, a sequence of many small steps is often needed to perform even
- simple calculations and assignments. We will discuss assembly language in
- much greater depth in Chapter 14, and my purpose here is merely to present
- the underlying concepts.
- Please note that variable names are not retained after a program has
- been compiled. Once BC has finished its job, all references to each
- variable name have been replaced with equivalent memory addresses in the
- object file. Further, once LINK has joined the object files and linked
- them to the BASIC language libraries, the procedure names are lost as well.
- These issues will be explored in much greater detail in Chapter 14.
-
-
- COMPILER DIRECTIVES
-
- As you have seen, some code is translated by the compiler into the
- equivalent assembly language statements, while other code is instead
- converted to calls to the language routines in the BASIC libraries. Some
- statements, however, are not translated at all. Rather, they are known as
- *compiler directives* that merely provide information to the compiler as
- it works. Some examples of these non-executable BASIC statements include
- DEFINT, OPTION BASE, and REM, as well as the various "metacommands" such
- as '$INCLUDE and '$DYNAMIC. Some others are SHARED, BYVAL, DATA, DECLARE,
- CONST, and TYPE.
- For our purposes here, it is important to understand that DIM when used
- on a static array is also a non-executable statement. Because the size of
- the array is known when the program is compiled, BC can simply set aside
- memory in the object file to hold the array contents. Therefore, code does
- not need to be generated to actually create the array. Similarly, TYPE/END
- TYPE statements also merely define a given number of bytes that will
- ultimately end up in the program file when the TYPE variable is later
- dimensioned by your program.
-
-
- EVENT AND ERROR CHECKING
-
- The last compiler responsibility I will discuss here is the generation of
- additional code to test for events and debugging errors. This occurs
- whenever a program is compiled using the /d, /w, or /v command line
- switches. Although event trapping and debugging are entirely separate
- issues, they are handled in a similar manner. Let's start with event
- trapping.
- When the IBM PC was first introduced, the ability to handle interrupt-
- driven events distinguished it from its then-current Apple and Commodore
- counterparts. Interrupts can provide an enormous advantage over polling
- methods, since polling requires a program to check constantly for, say,
- keyboard or communications activity. With polling, a program must
- periodically examine the keyboard using INKEY$, to determine if a key was
- pressed. But when interrupts are used, the program can simply go about its
- business, confident that any keystrokes will be processed. Here's how that
- works:
- Each time a key is pressed on a PC, the keyboard generates a hardware
- interrupt that suspends whatever is currently happening and then calls a
- routine in the ROM BIOS. That routine in turn reads the character from the
- keyboard's output port, places it into the PC's keyboard buffer, and
- returns to the interrupted application. The next time a program looks for
- a keystroke, that key is already waiting to be read. For example, a
- program could begin writing a huge multi-megabyte disk file, and any
- keystrokes will still be handled even if the operator continues to type.
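A simplified Python model of that arrangement follows. The function names are invented, the "interrupt" is just a function call, and the sketch ignores the real buffer's limited capacity; it is meant only to show why no keystrokes are lost while the program is busy.

```python
from collections import deque

keyboard_buffer = deque()

def keyboard_interrupt(char: str) -> None:
    """Model of the BIOS handler: store the key and return at once."""
    keyboard_buffer.append(char)

def inkey() -> str:
    """Model of INKEY$: return the next waiting key, or "" if there is none."""
    return keyboard_buffer.popleft() if keyboard_buffer else ""

def long_running_job(keys_typed_meanwhile: str) -> str:
    """Simulate busy work during which the operator keeps typing."""
    for key in keys_typed_meanwhile:
        keyboard_interrupt(key)        # each keypress briefly interrupts the job
    return "done"
```

The job never looks at the keyboard, yet once it finishes, successive calls to `inkey()` return the typed characters in order.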
- Understand that hardware interrupts are made possible by a direct
- physical connection between the keyboard circuitry and the PC's
- microprocessor. The use of interrupts is a powerful concept, and one which
- is important to understand. Unfortunately, BASIC does not use interrupts
- in most cases, and this discussion is presented solely in the interest of
- completeness.
-
-
- Event Trapping
-
- BASIC provides a number of event handling statements that perhaps *could*
- be handled via interrupts, but aren't. When you use ON TIMER, for example,
- code is added to periodically call a central event handler to check if the
- number of seconds specified has elapsed. Because there are so many
- possible event traps that could be active at one time, it would be
- unreasonable to expect BASIC to set up separate interrupts to handle each
- possibility. In some situations, such as ON KEY, there is a corresponding
- interrupt, in this case the keyboard interrupt. However, some events
- such as ON PLAY(Count), where a GOSUB is made whenever the PLAY buffer has
- fewer than Count characters remaining, have no corresponding physical
- interrupt. Therefore, polling for that condition is the only reasonable
- method.
- The example in Listing 1-3 shows what happens when you compile using the
- /v switch. Notice that the calls to B$EVCK (Event Check) are not part of
- the original source code. Rather, they show the additional code that BC
- places just before each program statement.
-
-
- DEFINT A-Z
- CALL B$EVCK 'this call is generated by BC
- ON TIMER(1) GOSUB HandleTime
- CALL B$EVCK 'this call is generated by BC
- TIMER ON
- CALL B$EVCK 'this call is generated by BC
- X = 10
- CALL B$EVCK 'this call is generated by BC
- Y = 100
- CALL B$EVCK 'this call is generated by BC
- END
-
- HandleTime:
- CALL B$EVCK 'this call is generated by BC
- BEEP
- CALL B$EVCK 'this call is generated by BC
- RETURN
-
- Listing 1-3: When the /v compiler switch is used, BC generates calls to a
- central event handler at each BASIC statement.
-
-
- At five bytes per call, you can see that using /v can quickly bloat a
- program to an unacceptable size. One alternative is to instead use /w.
- In fact, /w can be particularly attractive in those cases where event
- handling cannot be avoided, because it lets you specify where a call to
- B$EVCK is made: at each line label or line number in your source code. The
- only downside to using line numbers and labels is that additional working
- memory is needed by BC to remember the addresses in the code where those
- labels are placed. This is not usually a problem, though, unless the
- program is very large or every line is labeled.
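To get a feel for the difference in overhead, here is a rough Python estimate of how many event-check calls each switch would add to a program. It is a counting sketch under simplifying assumptions: one statement per line, labels recognized only by a trailing colon (line numbers are not handled), and a short, incomplete list of non-executable keywords. The helper name `count_event_checks` is hypothetical.

```python
DIRECTIVES = ("DEFINT", "REM", "'")    # non-executable lines (partial list)
CALL_SIZE = 5                          # bytes per call, as stated in the text

def count_event_checks(lines, switch):
    """Count added calls under /v (one per statement) or /w (one per label)."""
    calls = 0
    for line in lines:
        text = line.strip()
        if not text or text.upper().startswith(DIRECTIVES):
            continue                   # blank lines and directives add nothing
        is_label = text.endswith(":")
        if switch == "/v" and not is_label:
            calls += 1
        elif switch == "/w" and is_label:
            calls += 1
    return calls
```

Fed the source lines of Listing 1-3, this gives 7 calls (35 bytes) for /v but only 1 call (5 bytes) for /w, since HandleTime is the only label.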
- All of the various BASIC event handling commands are specified using the
- ON statement. It is important to understand, however, that ON GOTO and ON
- GOSUB do not involve events. That is, they are really just an alternate
- form of GOTO and GOSUB respectively, and thus do not require compiling with
- /w or /v.
-
-
- Error Trapping
-
- The last compiler option to consider here is the /d switch, because it too
- generates extra code that you might not otherwise be aware of. When a
- program is compiled with /d, several things are added. First, for every BASIC
- statement a call is made to a routine named B$LINA, which merely checks to
- see if Ctrl-Break has been pressed. Normally, a compiled BASIC program is
- immune to pressing the Ctrl-C and Ctrl-Break keys, except during an INPUT
- or LINE INPUT statement. Since much of the purpose of a debugging mode is
- to let you break out of an errant program gone berserk, the Ctrl-Break
- checking must be performed frequently. These checks are handled in much
- the same way as event trapping, by calling a special routine once for each
- line in your source code.
- Another important factor resulting from the use of /d is that all array
- references are handled through a special called routine which ensures that
- the element number specified is in fact legal. Many people don't realize
- this, but when a program is compiled without /d and an invalid element is
- given, BASIC will blindly write to the wrong memory locations. For
- example, if you use DIM Array%(1 TO 100) and then attempt to assign, say,
- element number 200, BASIC is glad to oblige. Of course, there *is* no
- element 200 in that case, and some other data will no doubt be overwritten
- in the process.
- To prevent these errors from going undetected, BC calls the B$HARY (Huge
- Array) routine to calculate the address based on the element number
- specified. If B$HARY determines that the array reference is out of bounds,
- it invokes an internal error handler and you receive the familiar
- "Subscript out of range" message. Normally, the compiler accesses array
- elements using as little code as possible, to achieve the highest possible
- performance. If a static array is dimensioned to 100 elements and you
- assign element 10, BC knows at the time it compiles your program the
- address at which that element resides. It can therefore access that
- element directly, just as if it were a non-array variable.
- Even when you use a variable to specify an array element such as
- Array%(X) = 12, the starting address of the array is known, and the value
- in X can be used to quickly calculate how far into the array that element
- is located. Therefore, the lack of bounds checking in programs that do not
- use /d is not a bug in BASIC. Rather, it is merely a trade-off to obtain
- very high performance. Indeed, one of the primary purposes of using /d is
- to let BC find mistakes in your programs during development, though at the
- cost of execution speed.
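The trade-off can be demonstrated with a Python model of a small data segment. Everything here is illustrative: the 400-byte `memory`, the placement of an unrelated integer at offset 398, and the function names are assumptions chosen so that element 200 of a 100-element array happens to land on that other variable.

```python
import struct

memory = bytearray(400)       # pretend data segment
ARRAY_BASE = 0                # Array%(1 TO 100) occupies bytes 0 through 199
LOWER, UPPER = 1, 100
OTHER_VAR = 398               # an unrelated integer stored at offset 398

def element_address(index):
    """Base address plus two bytes for each element past the lower bound."""
    return ARRAY_BASE + (index - LOWER) * 2

def store_unchecked(index, value):
    """Without /d: write wherever the arithmetic lands."""
    struct.pack_into("<h", memory, element_address(index), value)

def store_checked(index, value):
    """With /d: refuse any index outside the dimensioned range."""
    if not LOWER <= index <= UPPER:
        raise IndexError("Subscript out of range")
    store_unchecked(index, value)
```

`store_unchecked(200, 12)` silently replaces whatever was at offset 398, while `store_checked(200, 12)` raises an error instead. The checked version costs an extra comparison on every access, which is the speed penalty the text describes.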
- The biggest complication from BASIC's point of view is when huge
- (greater than 64K) arrays are being manipulated. In fact, B$HARY is the
- very same routine that BC calls when you use the /ah switch to specify huge
- arrays (hence the name HARY). Since extra code is needed to set up and
- call B$HARY compared to the normal array access, using /ah also creates
- programs that are larger and slower than when it is not used. Further,
- because B$HARY is used by both /d and /ah, invalid element accesses will
- also be trapped when you compile using /ah.
-
-
- Overflow Errors
-
- The final result of using /d is that extra code is generated after certain
- math operations, to check for overflow errors that might otherwise go
- undetected. Overflow errors are those that result in a value too large for
- a given data type. For example, if you multiply two integers and the result
- exceeds 32767, that causes an overflow error. Similarly, an underflow error
- would be created by a calculation resulting in a value that is too small.
- When a floating point math operation is performed, errors that result
- from overflow are detected by the routines that perform the calculation.
- When that happens there is no recourse other than halting your program with
- an appropriate message. Integer operations, however, are handled directly
- by 80x86 instructions. Further, an out of bounds result is not necessarily
- illegal to the CPU. Thus, programs compiled without the /d option can
- produce erroneous results, and without any indication that an error
- occurred.
- To prove this to yourself, compile and run the short program shown in
- Listing 1-4, but without using /d. Although the correct result should be
- 90000, the answer that is actually displayed is 24464. And you will notice
- that no error message is displayed!
- As with illegal array references, BC would rather optimize for speed, and
- give you the option of using /d as an aid for tracking down such errors as
- they occur. If you compile the program in Listing 1-4 with the /d option,
- then BASIC will report the error as expected.
- Since an overflow resulting from integer operations is not technically
- an error as far as the CPU is concerned, how, then, can BASIC trap for
- that? Although an error in the usual sense is not created, there is a
- special flag variable within the CPU that is set whenever such a condition
- occurs. Further, a little-used assembler instruction, INTO (Interrupt 4
- if Overflow), will generate software Interrupt 4 if that flag is set.
- Therefore, all BC has to do is create an Interrupt 4 handler, and then
- place an INTO instruction after every integer math operation in the
- compiled code. The interrupt handler will receive control and display an
- "Overflow" message whenever an INTO calls it. Since the INTO instruction
- is only one byte and is also very fast, using it this way results in very
- little size or performance degradation.
-
-
- X% = 30000
- Y% = X% * 3
- PRINT Y%
-
- Listing 1-4: This brief program illustrates how overflow errors are handled
- in BASIC.
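The wraparound itself is easy to reproduce in Python, which makes the 24464 figure less mysterious. This sketch models only the arithmetic: `to_int16` keeps the low 16 bits of a result the way a two-byte variable would, and `multiply_checked` stands in for the effect of an INTO after the multiply. The function names are invented for the example.

```python
def to_int16(value):
    """Keep only the low 16 bits and reinterpret them as a signed integer."""
    value &= 0xFFFF
    return value - 0x10000 if value >= 0x8000 else value

def multiply_unchecked(a, b):
    """Without /d: the low word of the product is stored as-is."""
    return to_int16(a * b)

def multiply_checked(a, b):
    """With /d: the equivalent of testing for overflow after the multiply."""
    product = a * b
    if not -32768 <= product <= 32767:
        raise OverflowError("Overflow")
    return product
```

`multiply_unchecked(30000, 3)` returns 24464, which is 90000 minus 65536: the bits that do not fit in two bytes are simply dropped. `multiply_checked(30000, 3)` raises an error instead.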
-
-
- COMPILER OPTIMIZATION
-
- Designing a compiler for a language as complex as BASIC involves some very
- tricky programming indeed. Although it is one thing to translate a BASIC
- source file into a series of assembly language commands, it is another
- matter entirely to do it well! Consider that the compiler must be able to
- accept a BASIC statement such as X! = ABS(SQR((Y# + Z!) ^ VAL(Work$))), and
- reduce that to the individual steps necessary to arrive at the correct
- result.
- Many, many details must be accounted for and handled, not the least of
- which are syntax or other errors in the source code. Moreover, there are
- an infinite number of ways that a programmer can accomplish the same thing.
- Therefore, the compiler must be able to recognize many different
- programming patterns, and substitute efficient blocks of assembler code
- whenever it can. This is the role of an *optimizing compiler*.
- One important type of optimization is called *constant folding*. This
- means that as much math as possible is performed during compilation, rather
- than creating code to do that when the program runs. For example, if you
- have a statement such as X = 4 * Y * 3, BC can, and does, change that to
- X = Y * 12. After all, why multiply 3 times 4 later, when the answer can be
- determined now? This substitution is performed entirely by the BC
- compiler, without your knowing about it.
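A minimal Python sketch of the idea, restricted to products: numeric factors are multiplied together immediately, and only the variables are left for run time. The list-of-factors representation and the name `fold_product` are assumptions made for the illustration; BC's internal representation is not described here.

```python
def fold_product(factors):
    """Combine every numeric factor of a product into a single constant."""
    constant = 1
    variables = []
    for factor in factors:
        if isinstance(factor, int):
            constant *= factor         # done now, during "compilation"
        else:
            variables.append(factor)   # must wait until the program runs
    return variables + [constant] if variables else [constant]
```

`fold_product([4, "Y", 3])` returns `["Y", 12]`, matching the X = Y * 12 rewrite described above, so one multiplication remains at run time instead of two.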
- Another important type of optimization is BASIC's ability to remember
- calculations it has already performed, and use the results again later if
- possible. BC is especially brilliant in this regard, and it can look ahead
- many lines in your source code for a repeated use of the same calculations.
- Listing 1-5 shows a short fragment of BASIC source code, along with the
- resultant assembler output.
-
-
- X% = 3 * Y% * 4
- MOV AX,12 ;move the value 12 into AX
- IMUL WORD PTR [Y%] ;Integer-Multiply that times Y%
- MOV WORD PTR [X%],AX ;assign the result in AX to X%
-
- A% = S% * 100
- MOV BX,AX ;save the result from above in BX
- MOV AX,100 ;then assign AX to 100
- IMUL WORD PTR [S%] ;now multiply AX times S%
- MOV WORD PTR [A%],AX ;and assign A% from the result
-
- Z% = Y% * 12
- MOV WORD PTR [Z%],BX ;assign Z% from the earlier result
-
- Listing 1-5: These short code fragments illustrate how adept BC is at
- reusing the result of earlier calculations already performed.
-
-
- As you can see in the first part of Listing 1-5, the value of 3 times 4 was
- resolved to 12 by the compiler. Code was then generated to multiply the
- 12 times Y%, and the result is in turn assigned to X%. This is similar to
- the compiled code examined earlier in Listing 1-1. Notice, however, that
- before the second multiplication of S% is performed, the result currently
- in AX is saved in the BX register. Although AX is destroyed by the
- subsequent multiplication of S% times 100, the result that was saved
- earlier in BX can be used to assign Z% later on. Also notice that even
- though 3 * 4 was used first, BC was smart enough to realize that this is
- the same as the 12 used later.
- While the compiler can actually look ahead in your source code as it
- works, such optimization will be thwarted by the presence of line numbers
- and labels, as well as IF blocks. Since a GOTO or GOSUB could jump to a
- labeled source line from anywhere in the program, there is no way for BC
- to be sure that earlier statements were executed in sequence. Likewise,
- the compiler has no way to know which path in an IF/ELSE block will be
- taken at run time, and thus cannot optimize across those statements.
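- When a repeated calculation must straddle a label or an IF block, one
- possible workaround is to store the value in a variable of your own rather
- than rely on the compiler to remember it. The fragment below is only a
- sketch of that idea. The names Product and Again are invented for the
- illustration, and the code BC actually generates for it is not shown here.

```basic
DEFINT A-Z
Y = 5: S = 2

Product = Y * 12        'compute the shared value once, by hand
X = Product             'same value as 3 * Y * 4
A = S * 100

Again:                  'a label: a GOTO elsewhere could arrive here,
Z = Product             'so BC cannot assume a register still holds Y * 12
PRINT X; A; Z           'displays the values 60, 200, and 60
```

- The trade-off is an extra variable in memory, in exchange for not
- depending on an optimization that the label would otherwise defeat.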
-
-
- THE BASIC RUN-TIME LIBRARIES
-
- Microsoft compiled BASIC lets you create two fundamentally different types
- of programs. Those that are entirely self-contained in one .EXE file are
- compiled with the /o command line switch. In this case, the compiler
- creates translations such as those we have already discussed, and also
- generates calls to the BASIC language routines contained in the library
- files supplied by Microsoft. When your compiled program is subsequently
- linked, only those routines that are actually used will be added to your
- program.
- When /o is not used, a completely different method is employed. In this
- case, a special .EXE file that contains support for every BASIC statement
- is loaded along with the BASIC program when the program is run from the DOS
- command line. As you are about to see, there are advantages and
- disadvantages to each method. For the purpose of this discussion I will
- refer to stand-alone programs as BCOM programs, after the BCOMxx.LIB
- library name used in all versions of QuickBASIC. Programs that instead
- require the BRUNxx.EXE file to be present at run time will be called
- BRUN programs.
- Beginning with BASIC 7 PDS, the library naming conventions used by
- Microsoft have become more obscure. This is because PDS includes a number
- of variations for each method, depending on the type of "math package" that
- is specified when compiling and whether you are compiling a program to run
- under DOS or OS/2. These variations will be discussed fully in Chapter 6,
- when we examine all of the possible options that each compiler version has
- to offer. But for now, we will consider only the two basic methods--BCOM
- and BRUN. The primary differences between these two types of programs are
- shown in Figure 1-2.
-
-
- 1. BCOM programs require less memory, run faster, and do not require
- the presence of the BRUNxx.EXE file when the program is run.
-
- 2. BRUN programs occupy less disk space, and also allow subsequent
- chaining to other programs that can share the common library code which
- is already resident. Chained-to programs also load quickly because the
- BRUN library is already in memory.
-
- Figure 1-2: A comparison of the fundamental differences between BCOM and
- BRUN programs.
-
-
- Stand-alone BCOM programs are always larger than equivalent BRUN programs
- because the library code for PRINT, INSTR, and so forth is included in the
- final .EXE file. However, less *memory* will be required when the program
- runs, since only the code that is really needed is loaded into the PC.
- Conversely, a BRUN program will take less disk space, because it contains
- only the compiled code. The actual routines to handle each BASIC
- statement are stored in the BRUNxx.EXE file, and that file is loaded
- automatically when the main program is run from DOS.
- You might think that since a BRUN program is physically smaller on disk
- it will load faster, but this is not necessarily true. When you execute
- a BRUN program from the DOS command line, one of the first things it does
- is load the BRUN .EXE support file. Since this support file is fairly
- large, the overall load time will be much greater than the compiled BASIC
- program's file size would indicate. However, if the main program
- subsequently chains to another BASIC program, that program will load
- quickly because the BRUN file does not need to be loaded a second time.
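- As a minimal illustration of chaining, consider the two short programs
- sketched here. The file names MENU.BAS and REPORT.BAS are hypothetical,
- and both programs are assumed to be compiled without /o so they share the
- same run-time file.

```basic
'MENU.BAS -- hypothetical main program, compiled without /o
PRINT "Main menu loaded"
CHAIN "REPORT"          'runs REPORT.EXE; the BRUN file is already resident

'REPORT.BAS -- hypothetical second program, also compiled without /o
PRINT "Report program loaded"
END
```

- Only the first program pays the cost of loading the BRUN support file.
- The chained-to program needs only its own, much smaller, .EXE file to be
- read from disk.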
- One other important difference between these two methods is the way that
- the BASIC language routines are accessed. When a BCOM program is compiled
- and linked, the necessary routines are called in the usual fashion. That
- is, the compiler generates code that calls the routines in the BCOM library
- directly. When the program is subsequently linked, the procedure names are
- translated by LINK into the equivalent memory addresses. That is, a call
- to PRINT is in effect translated from CALL B$PESD to CALL ####:####, where
- ####:#### is a segment and address.
- BRUN programs, on the other hand, use a system of interrupts to
- access the BASIC language routines. Since there is no way for LINK to know
- exactly where in memory the BRUNxx.EXE file will be ultimately loaded, the
- interrupt vector table located in low memory is used to hold the various
- routine addresses. Although many of these interrupt entries are used by
- the PC's system resources, many others are available. Again, I will defer
- a thorough treatment of call methods and interrupts until Chapter 14. But
- for now, suffice it to say that a direct call is slightly faster than an
- indirect call, where the address to be called must first be retrieved from
- a table.
- As an interesting aside, the routines in the BRUNxx.EXE file in fact
- modify the caller's code to perform a direct call, rather than an interrupt
- instruction. Therefore, the first time a given block of code is executed,
- it calls the run-time routines through an interrupt instruction.
- Thereafter, the address where the BRUN file has been loaded is known, and
- will be used the next time that same block of code is executed. In
- practice, however, this improves only code that lies within a FOR/NEXT,
- WHILE, or DO loop. Further, code that is executed only once will actually
- be much slower than in a BCOM program, because of the added instructions
- that perform the self-modification (the program changes itself).
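- If you want to see how much this matters on your own PC, a simple timing
- loop such as the following sketch can be compiled both ways and compared.
- The program name TIMETEST and the loop count are arbitrary choices made
- for this example, and the count may need adjusting to give a measurable
- time on a fast machine.

```basic
DEFINT A-Z
Start! = TIMER
FOR X = 1 TO 20000
  Position = INSTR("ABCDEFG", "F")  'each pass calls a run-time routine
NEXT
PRINT "Elapsed:"; TIMER - Start!; "seconds"
'Stand-alone (BCOM):  BC TIMETEST /O;   then   LINK TIMETEST;
'Run-time (BRUN):     BC TIMETEST;      then   LINK TIMETEST;
```

- Because the call to INSTR lies within a loop, the BRUN version pays the
- interrupt and self-modification overhead on the first pass only. Any
- difference that remains between the two timings reflects the other costs
- described above.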
- Notice that when BC compiles your program, it places the name of the
- appropriate library into the object file. The name BC uses depends on
- which compiler options were given. This way you don't have to specify the
- correct name manually, and LINK can read that name and act accordingly.
- Although QuickBASIC provides only two libraries--one for BCOM programs and
- one for BRUN--BASIC PDS offers a number of additional options. Each of
- these options requires the program to be linked with a different library.
- That is, there are both BRUN and BCOM libraries for use with OS/2, for near
- and far strings, and for using IEEE or Microsoft's alternate math
- libraries. Yet another library is provided for 8087-only operation.
-
-
- GRANULARITY
-
- Until now, we have examined only the actions and methods used by the BC
- compiler. However, the process of creating an .EXE file that can be run
- from the DOS command line is not complete until the compiled object file
- has been linked to the BASIC libraries. I stated earlier that when a
- stand-alone program is created using the /o switch, only those routines in
- the BCOM library that are actually needed will be added to the program.
- Unfortunately, that is not entirely accurate. While it is true that LINK
- is very smart and will bring in only those routines that are actually
- called, there is one catch.
- Imagine that you have written a BASIC program which is composed of two
- separate modules. In one file is the main program that contains only in-
- line code, and in the other are two BASIC subprograms. Even if the main
- program calls only one of those subprograms, both will be added when the
- program is linked. That is, LINK can resolve routines to the source file
- level only, but cannot extract a single routine from an object module which
- contains multiple routines. Since a .LIB library file is merely a
- collection of separate object modules, all of the routines that reside in
- a given module will be added to a program, even if only one has been
- accessed. This property is called *granularity*, and it determines how
- finely LINK can extract routines from a library.
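- The two-module situation just described might look like the sketch that
- follows. The file and procedure names are hypothetical, and the two files
- are shown together only for convenience; each would be compiled
- separately with BC.

```basic
'MAIN.BAS -- hypothetical main module containing only in-line code
DECLARE SUB ShowTitle ()
CALL ShowTitle          'the only subprogram this program ever calls
END

'SUBS.BAS -- hypothetical second module, compiled separately
SUB ShowTitle
  PRINT "Granularity demonstration"
END SUB

SUB NeverCalled         'still ends up in the .EXE file, because LINK
  PRINT "Unused"        'takes SUBS.OBJ as a single unit
END SUB
'Compile each file with BC, then link with:  LINK MAIN+SUBS;
```

- Following the same reasoning, placing each subprogram in its own source
- file and collecting the resulting object modules in a .LIB library should
- let LINK bring in only the modules that are actually referenced.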
- In the case of the libraries supplied with BASIC, the determining factor
- is which assembly language routines were combined with which other routines
- in the same source file by the programmers at Microsoft. In QuickBASIC
- 4.5, for example, when a program uses the CLS statement, the routines that
- handle COLOR, CSRLIN, POS(0), LOCATE, and the function form of SCREEN are
- also added. This is true even if none of those other statements have been
- used. Fortunately, Microsoft has done much to improve this situation in
- BASIC PDS, but there is still room for improvement. In BASIC PDS, CLS is
- stored in a separate file; however, POS(0), CSRLIN, and SCREEN are still
- together, as are COLOR and LOCATE.
- Obviously, Microsoft has its reasons for doing what it does, and I
- won't attempt to second-guess its expertise here. The BASIC language
- libraries are extremely complex and contain many routines. (The QuickBASIC
- 4.5 BCOM45.LIB file contains 1,485 separate assembler procedures.) With
- such an enormous number of assembly language source files to deal with, it
- no doubt makes a lot of sense to organize the related routines together.
- But it is worth mentioning that Crescent Software's P.D.Q. library can
- replace much of the functionality of the BCOM libraries, and with complete
- granularity. In fact, P.D.Q. can create working .EXE programs from BASIC
- source that are less than 800 bytes in size.
-
-
- SUMMARY
- =======
-
- In this chapter, you learned about the process of compiling, and the kinds
- of decisions a sophisticated compiler such as Microsoft BASIC must make.
- In some cases, the BASIC compiler performs a direct translation of your
- BASIC source code into assembly language, and in others it creates calls
- to existing routines in the BCOM libraries. Besides creating the actual
- assembler code, BASIC must also allocate space for all of the data used in
- a program.
- You also learned some basics about assembly language, which will be
- covered in more detail in Chapter 13. However, upcoming chapters will
- also use brief assembly language examples to show the relative
- efficiency of different coding styles. In Chapter 2, you will
- learn how variables and other data are stored in memory.